{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Advanced RAG with LlamaIndex and Milvus\n",
    "\n",
    "This notebook walks your through an advanced Retrieval-Augmented Generation (RAG) pipeline using LlamaIndex and Milvus.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Prerequisites\n",
    "\n",
    "### Installation\n",
    "First, you need to install the required packages for LlamaIndex and Milvus. The python version 3.11 is used in this sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Active code page: 65001\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "ERROR: Invalid requirement: \"'llama_index\": Expected package name at the start of dependency specifier\n",
      "    'llama_index\n",
      "    ^\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Active code page: 65001\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "ERROR: Invalid requirement: \"'milvus\": Expected package name at the start of dependency specifier\n",
      "    'milvus\n",
      "    ^\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Active code page: 65001\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "ERROR: Invalid requirement: \"'llama-index-vector-stores-milvus\": Expected package name at the start of dependency specifier\n",
      "    'llama-index-vector-stores-milvus\n",
      "    ^\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Active code page: 65001\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "ERROR: Invalid requirement: \"'llama-index-embeddings-openai\": Expected package name at the start of dependency specifier\n",
      "    'llama-index-embeddings-openai\n",
      "    ^\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Active code page: 65001\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "ERROR: Invalid requirement: \"'llama-index-llms-openai\": Expected package name at the start of dependency specifier\n",
      "    'llama-index-llms-openai\n",
      "    ^\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Active code page: 65001\n",
      "Looking in indexes: https://mirrors.aliyun.com/pypi/simple/\n",
      "Requirement already satisfied: python-dotenv in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (1.0.1)\n",
      "Requirement already satisfied: torch in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (2.6.0)\n",
      "Requirement already satisfied: sentence-transformers in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (3.4.1)\n",
      "Requirement already satisfied: filelock in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from torch) (3.17.0)\n",
      "Requirement already satisfied: typing-extensions>=4.10.0 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from torch) (4.12.2)\n",
      "Requirement already satisfied: networkx in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from torch) (3.4.2)\n",
      "Requirement already satisfied: jinja2 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from torch) (3.1.6)\n",
      "Requirement already satisfied: fsspec in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from torch) (2025.3.0)\n",
      "Requirement already satisfied: setuptools in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from torch) (75.8.0)\n",
      "Requirement already satisfied: sympy==1.13.1 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from torch) (1.13.1)\n",
      "Requirement already satisfied: mpmath<1.4,>=1.1.0 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from sympy==1.13.1->torch) (1.3.0)\n",
      "Requirement already satisfied: transformers<5.0.0,>=4.41.0 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from sentence-transformers) (4.49.0)\n",
      "Requirement already satisfied: tqdm in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from sentence-transformers) (4.67.1)\n",
      "Requirement already satisfied: scikit-learn in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from sentence-transformers) (1.6.1)\n",
      "Requirement already satisfied: scipy in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from sentence-transformers) (1.15.2)\n",
      "Requirement already satisfied: huggingface-hub>=0.20.0 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from sentence-transformers) (0.29.2)\n",
      "Requirement already satisfied: Pillow in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from sentence-transformers) (11.1.0)\n",
      "Requirement already satisfied: packaging>=20.9 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from huggingface-hub>=0.20.0->sentence-transformers) (24.2)\n",
      "Requirement already satisfied: pyyaml>=5.1 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from huggingface-hub>=0.20.0->sentence-transformers) (6.0.2)\n",
      "Requirement already satisfied: requests in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from huggingface-hub>=0.20.0->sentence-transformers) (2.32.3)\n",
      "Requirement already satisfied: colorama in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from tqdm->sentence-transformers) (0.4.6)\n",
      "Requirement already satisfied: numpy>=1.17 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers) (1.26.4)\n",
      "Requirement already satisfied: regex!=2019.12.17 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers) (2024.11.6)\n",
      "Requirement already satisfied: tokenizers<0.22,>=0.21 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers) (0.21.0)\n",
      "Requirement already satisfied: safetensors>=0.4.1 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from transformers<5.0.0,>=4.41.0->sentence-transformers) (0.5.3)\n",
      "Requirement already satisfied: MarkupSafe>=2.0 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from jinja2->torch) (3.0.2)\n",
      "Requirement already satisfied: joblib>=1.2.0 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from scikit-learn->sentence-transformers) (1.4.2)\n",
      "Requirement already satisfied: threadpoolctl>=3.1.0 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from scikit-learn->sentence-transformers) (3.5.0)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers) (3.4.1)\n",
      "Requirement already satisfied: idna<4,>=2.5 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers) (3.10)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers) (2.3.0)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in z:\\condadir\\envs\\llamaindex\\lib\\site-packages (from requests->huggingface-hub>=0.20.0->sentence-transformers) (2025.1.31)\n",
      "LlamaIndex version: 0.10.36\n"
     ]
    }
   ],
   "source": [
    "# !pip3 install 'llama_index>=0.10.36'\n",
    "# !pip3 install 'milvus>=2.4.0' 'pymilvus>=2.4.0' openai\n",
    "# !pip3 install 'llama-index-vector-stores-milvus>=0.1.11'\n",
    "# !pip3 install 'llama-index-embeddings-openai>=0.1.9'\n",
    "# !pip3 install 'llama-index-llms-openai>=0.1.18'\n",
    "# !pip3 install python-dotenv torch sentence-transformers\n",
    "\n",
    "import llama_index\n",
    "from importlib.metadata import version\n",
    "\n",
    "print(f\"LlamaIndex version: {version('llama_index')}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "### Start Milvus Service\n",
    "\n",
    "There are 2 options to start a Milvus service:,\n",
    "- [Zilliz Cloud](https://zilliz.com/cloud): Zilliz provides cloud-native service for Milvus. It simplifies the process of deploying and scaling vector search applications by eliminating the need to create and maintain complex data infrastructure. [Get Started Free!](https://cloud.zilliz.com/signup)\n",
    "- [Open Source Milvus](https://milvus.io): You can install the open source Milvus using either Docker Compose or on Kubernetes.\n",
    "\n",
    "Here, we use [Milvus Lite](https://milvus.io/docs/milvus_lite.md) to start with a lightweight version of Milvus, which works seamlessly with Google Colab and Jupyter Notebook\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "#from milvus import default_server\n",
    "\n",
    "# default_server.cleanup()  # Optional, run this line if you want to cleanup previous data\n",
    "#default_server.start()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set your OpenAI API key\n",
    "\n",
    "This tutorial uses an embedding model and LLM from OpenAI, for which you will need an API key set as an evironment variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# import os\n",
    "# os.environ[\"OPENAI_API_KEY\"] = \"sk-...\"\n",
    "# print(os.environ[\"OPENAI_API_KEY\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Define Embedding Model and LLM\n",
    "\n",
    "First, you can define an embedding model and LLM in a global settings object.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/anaconda3/lib/python3.11/site-packages/pandas/core/arrays/masked.py:60: UserWarning: Pandas requires version '1.3.6' or newer of 'bottleneck' (version '1.3.5' currently installed).\n",
      "  from pandas.core import (\n"
     ]
    }
   ],
   "source": [
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "from llama_index.llms.openai import OpenAI\n",
    "from llama_index.core.settings import Settings\n",
    "\n",
    "Settings.llm = OpenAI(model=\"gpt-4-turbo\", temperature=0.1)\n",
    "Settings.embed_model = OpenAIEmbedding(model=\"text-embedding-3-small\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Load data\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2024-05-14 20:20:45--  https://publicdataset.zillizcloud.com/milvus_doc.md\n",
      "Connecting to 127.0.0.1:1087... connected.\n",
      "Proxy request sent, awaiting response... 200 OK\n",
      "Length: 5153 (5.0K) [binary/octet-stream]\n",
      "Saving to: ‘data/milvus_doc.md’\n",
      "\n",
      "data/milvus_doc.md  100%[===================>]   5.03K  --.-KB/s    in 0s      \n",
      "\n",
      "2024-05-14 20:20:46 (328 MB/s) - ‘data/milvus_doc.md’ saved [5153/5153]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!mkdir -p 'data'\n",
    "!wget 'https://publicdataset.zillizcloud.com/milvus_doc.md' -O 'data/milvus_doc.md'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Document ID: ebe47875-ace0-4a96-b853-0b74e4c6d163\n"
     ]
    }
   ],
   "source": [
    "from llama_index.core import SimpleDirectoryReader\n",
    "\n",
    "# Load data\n",
    "documents = SimpleDirectoryReader(\n",
    "        input_files=[\"./data/milvus_doc.md\"]\n",
    ").load_data()\n",
    "\n",
    "print(\"Document ID:\", documents[0].doc_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Chunk documents into Nodes\n",
    "\n",
    "As the whole document is too large to fit into the context window of the LLM, you will need to partition it into smaller text chunks, which are called `Nodes` in LlamaIndex.\n",
    "\n",
    "With the `SentenceWindowNodeParser` each sentence is stored as a chunk together with a larger window of text surrounding the original sentence as metadata. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "53\n"
     ]
    }
   ],
   "source": [
    "from llama_index.core.node_parser import SentenceWindowNodeParser\n",
    "\n",
    "# Create the sentence window node parser \n",
    "node_parser = SentenceWindowNodeParser.from_defaults(\n",
    "    window_size=3,\n",
    "    window_metadata_key=\"window\",\n",
    "    original_text_metadata_key=\"original_text\",\n",
    ")\n",
    "\n",
    "# Extract nodes from documents\n",
    "nodes = node_parser.get_nodes_from_documents(documents)\n",
    "\n",
    "print(len(nodes))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4: Build the index\n",
    "\n",
    "You will build the index that stores all the external knowledge in a Milvus vector database.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024-05-14 20:21:18.260686: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
      "To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
      "/usr/local/anaconda3/lib/python3.11/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.\n",
      "  warn(\"The installed version of bitsandbytes was compiled without GPU support. \"\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "'NoneType' object has no attribute 'cadam32bit_grad_fp32'\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Sparse embedding function is not provided, using default.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "a77593e94f134da3b13c8279f4577902",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Fetching 30 files:   0%|          | 0/30 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/anaconda3/lib/python3.11/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()\n",
      "  return self.fget.__get__(instance, owner)()\n"
     ]
    }
   ],
   "source": [
    "from llama_index.core import VectorStoreIndex\n",
    "from llama_index.vector_stores.milvus import  MilvusVectorStore\n",
    "from llama_index.core import StorageContext\n",
    "\n",
    "vector_store = MilvusVectorStore(dim=1536, \n",
    "                                 uri=\"http://localhost:19530\",\n",
    "                                 collection_name='advance_rag',\n",
    "                                 overwrite=True,\n",
    "                                 enable_sparse=True,\n",
    "                                 hybrid_ranker=\"RRFRanker\",\n",
    "                                 hybrid_ranker_params={\"k\": 60})\n",
    "\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "\n",
    "index = VectorStoreIndex(\n",
    "    nodes, \n",
    "    storage_context=storage_context\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 5: Setup the Query Engine\n",
    "\n",
    "### Build the Metadata Replacement Post Processor\n",
    "In advanced RAG, you can use the `MetadataReplacementPostProcessor` to replace the sentence in each node with it’s surrounding context as part of the sentence-window-retrieval method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from llama_index.core.postprocessor import MetadataReplacementPostProcessor\n",
    "\n",
    "# The target key defaults to `window` to match the node_parser's default\n",
    "postproc = MetadataReplacementPostProcessor(\n",
    "    target_metadata_key=\"window\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Add a re-ranker\n",
    "For advanced RAG, you can also add a re-ranker, which re-ranks the retrieved context for its relevance to the query. Note, that you should retrieve a larger number of `similarity_top_k`, which will be reduced to `top_n`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from llama_index.core.postprocessor import SentenceTransformerRerank\n",
    "\n",
    "# BAAI/bge-reranker-base is a cross-encoder model\n",
    "# link: https://huggingface.co/BAAI/bge-reranker-base\n",
    "rerank = SentenceTransformerRerank(\n",
    "    top_n = 3, \n",
    "    model = \"BAAI/bge-reranker-base\" \n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, you can put all components together in the query engine! "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# The QueryEngine class is equipped with the generator and facilitates the retrieval and generation steps\n",
    "query_engine = index.as_query_engine(\n",
    "    similarity_top_k = 3, \n",
    "    vector_store_query_mode=\"hybrid\",  # Milvus starts supporting from version 2.4, use 'Default' for versions before 2.4\n",
    "    node_postprocessors = [postproc, rerank],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 6: Run an Advanced RAG Query on Your Data\n",
    "Now, you can run advanced RAG queries on your data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Yes, users can delete Milvus entities through non-primary key filtering by using complex boolean expressions.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\n",
    "    \"Can user delete milvus entities through non-primary key filtering?\"\n",
    ")\n",
    "print(str(response))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Window: \n",
      "\n",
      "Delete Entities\n",
      "This topic describes how to delete entities in Milvus.\n",
      "\n",
      " Milvus supports deleting entities by primary key or complex boolean expressions.  Deleting entities by primary key is much faster and lighter than deleting them by complex boolean expressions.  This is because Milvus executes queries first when deleting data by complex boolean expressions.\n",
      "\n",
      " Deleted entities can still be retrieved immediately after the deletion if the consistency level is set lower than Strong.\n",
      "\n",
      "------------------\n",
      "Original Sentence: Milvus supports deleting entities by primary key or complex boolean expressions. \n"
     ]
    }
   ],
   "source": [
    "window = response.source_nodes[0].node.metadata[\"window\"]\n",
    "sentence = response.source_nodes[0].node.metadata[\"original_text\"]\n",
    "\n",
    "print(f\"Window: {window}\")\n",
    "print(\"------------------\")\n",
    "print(f\"Original Sentence: {sentence}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# References\n",
    "* [Llamaindex docs: Milvus Vector Store](https://docs.llamaindex.ai/en/stable/examples/vector_stores/MilvusIndexDemo.html)\n",
    "* [LlamaIndex docs: Metadata Replacement + Node Sentence Window](https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/MetadataReplacementDemo.html)\n",
    "* [Advanced Retrieval-Augmented Generation: From Theory to LlamaIndex Implementation](https://towardsdatascience.com/advanced-retrieval-augmented-generation-from-theory-to-llamaindex-implementation-4de1464a9930)\n",
    "* [Milvus Vector Store With Hybrid Retrieval](https://docs.llamaindex.ai/en/stable/examples/vector_stores/MilvusHybridIndexDemo/)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
